Sampling Techniques for Kernel Methods
Authors
Abstract
We propose randomized techniques for speeding up Kernel Principal Component Analysis on three levels: sampling and quantization of the Gram matrix in training, randomized rounding in evaluating the kernel expansions, and random projections in evaluating the kernel itself. In all three cases, we give sharp bounds on the accuracy of the obtained approximations. Rather intriguingly, all three techniques can be viewed as instantiations of the following idea: replace the kernel function k by a “randomized kernel” which behaves like k in expectation.
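The unifying idea — replacing the kernel k with a randomized kernel that equals k in expectation — can be illustrated by one of the simplest instances: sampling the Gram matrix. In this minimal sketch (the function names are ours, not the authors'), each off-diagonal entry is kept with probability p and rescaled by 1/p, so the sparse estimate is unbiased for the exact Gram matrix.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Standard RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def sampled_gram(X, p=0.5, gamma=1.0, rng=None):
    """Unbiased sparse estimate of the Gram matrix: each off-diagonal
    entry k(x_i, x_j) is kept with probability p and rescaled by 1/p,
    so E[G_hat] equals the exact Gram matrix entrywise."""
    rng = np.random.default_rng(rng)
    n = len(X)
    G_hat = np.zeros((n, n))
    for i in range(n):
        G_hat[i, i] = rbf_kernel(X[i], X[i], gamma)  # diagonal kept exactly
        for j in range(i + 1, n):
            if rng.random() < p:  # keep entry with probability p ...
                v = rbf_kernel(X[i], X[j], gamma) / p  # ... rescaled by 1/p
                G_hat[i, j] = G_hat[j, i] = v
    return G_hat
```

With p = 1 this reduces to the exact Gram matrix; for p < 1, averaging many independent draws converges to it, which is the "behaves like k in expectation" property the abstract describes.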
Similar references
oASIS: Adaptive Column Sampling for Kernel Matrix Approximation
Computing with large kernel or similarity matrices is essential to many state-of-the-art machine learning techniques in classification, clustering, and dimensionality reduction. The cost of forming and factoring these kernel matrices can become intractable for large datasets. We introduce an adaptive column sampling technique called Accelerated Sequential Incoherence Selection (oASIS) that sa...
Comparison of correlation analysis techniques for irregularly sampled time series
Geoscientific measurements often provide time series with irregular time sampling, requiring either data reconstruction (interpolation) or sophisticated methods to handle irregular sampling. We compare the linear interpolation technique and different approaches for analyzing the correlation functions and persistence of irregularly sampled time series, such as the Lomb-Scargle Fourier transformation and ...
Matrix Approximation for Large-scale Learning
Modern learning problems in computer vision, natural language processing, computational biology, and other areas are often based on large data sets of tens of thousands to millions of training instances. However, several standard learning algorithms, such as kernel-based algorithms, e.g., Support Vector Machines, Kernel Ridge Regression, Kernel PCA, do not easily scale to such orders of magnitu...
Local Adaptive Importance Sampling for Multivariate Densities with Strong Nonlinear Relationships
We consider adaptive importance sampling techniques which use kernel density estimates at each iteration as importance sampling functions. These can provide more nearly constant importance weights and more precise estimates of quantities of interest than the SIR algorithm when the initial importance sampling function is diffuse relative to the target. We propose a new method which adapts to the ...
Recursive Sampling for the Nystrom Method
We give the first algorithm for kernel Nyström approximation that runs in linear time in the number of training points and is provably accurate for all kernel matrices, without dependence on regularity or incoherence conditions. The algorithm projects the kernel onto a set of s landmark points sampled by their ridge leverage scores, requiring just O(ns) kernel evaluations and O(ns) additional r...
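The landmark construction underlying this abstract can be sketched in a few lines. This is only an illustration under simplifying assumptions: uniform landmark selection stands in for the ridge-leverage-score sampling the paper describes, and the helper name is ours.

```python
import numpy as np

def nystrom_approx(K, landmarks):
    """Nystrom approximation K ~= C @ pinv(W) @ C.T, where
    C = K[:, S] holds the landmark columns and W = K[S, S] is the
    kernel restricted to the landmark set S."""
    C = K[:, landmarks]                      # n x s landmark columns
    W = K[np.ix_(landmarks, landmarks)]      # s x s landmark block
    return C @ np.linalg.pinv(W) @ C.T
```

Only the s landmark columns of K are ever needed, which is the source of the O(ns) kernel-evaluation cost quoted above; when the landmarks span the range of a rank-s kernel matrix, the approximation is exact.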